
affinity propagation: add comparison test with sklearn implementation #132

Merged
merged 3 commits on Nov 20, 2018

Conversation

xiuliren
Contributor

see issue:
#131

@xiuliren xiuliren changed the title add comparison with python results to illustrate the difference with python add comparison test with python affinity propergation Nov 19, 2018
@alyst alyst changed the title add comparison test with python affinity propergation affinity propagation: add comparison test with sklearn implementation Nov 19, 2018
@alyst
Member

alyst commented Nov 19, 2018

Thanks for your work! Am I right that by "python comparison" you mean the sklearn implementation?

> I'll try to reimplement it if I find some time.

You can try it, but keep in mind that there should be clear benefits for the package to switch to it:

  • correctness (produce correct results where the current one fails)
  • performance (be faster or use less memory than the current implementation)

Actually, you will already need some tests when/if you write your own implementation of the method.

The current state of affprop tests in master is obviously unsatisfactory, because there are no tests for the correctness of the generated clusters.
What I had in mind was some series of very simple tests:

  • one 2D point case
  • two 2D points cases (distance below or above the tolerance)
  • three 2D points case
  • 10 2D points case

In all these test cases the point coordinates should be fixed (not randomly generated), so the "human eye" can easily see the expected result.
If one of these simple tests fails, it should be relatively easy to find the cause.

The current test of 500 random points in 10 dimensions should come last. A minimal sketch of one of the simple fixed-point cases is shown below.
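To make the intent concrete, here is a minimal sketch of what one of these fixed-point tests could look like. It assumes the package's affinityprop and assignments API, assumes that the diagonal of the similarity matrix acts as the preference, and uses illustrative coordinates and an illustrative preference value rather than anything from this PR:

```julia
using Test, Clustering

# Three fixed 2D points (columns): two close together, one far away.
X = [0.0 0.1 5.0;
     0.0 0.0 5.0]

# Similarity = negative squared Euclidean distance, as in Frey & Dueck (2007).
S = -[sum(abs2, X[:, i] .- X[:, j]) for i in 1:3, j in 1:3]

# The diagonal of S acts as the preference; pick a value between the near
# similarity (-0.01) and the far ones (about -50) so the expected clustering
# is unambiguous.
for i in 1:3
    S[i, i] = -25.0
end

result = affinityprop(S)

# Points 1 and 2 should share a cluster; point 3 should be on its own.
@test assignments(result)[1] == assignments(result)[2]
@test assignments(result)[1] != assignments(result)[3]
```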
I have some questions about your comparison with the sklearn results:

  • how did you generate python_assignments? Have you saved the Julia S matrix and loaded it into Python?
  • the clusters generated by "python" and affprop() could have different labels (assignments), e.g. the very same set of elements may be in cluster #11 in "python" and in cluster #95 in "Julia". Do you check for that?

@xiuliren
Contributor Author

xiuliren commented Nov 20, 2018

Good point, I just tried computing the Rand index for validation, and the current implementation is correct. Can we still merge this PR to keep the test?

Here is what I used for the python implementation (a sketch of the Rand index comparison follows the snippet):

```julia
using PyCall

# fit sklearn's AffinityPropagation on the precomputed similarity matrix
@pyimport sklearn.cluster as cl
af = cl.AffinityPropagation(affinity="precomputed")[:fit](similarityMatrix)

# sklearn labels are 0-based; shift to 1-based to match Julia assignments
labels = af[:labels_] .+ 1
```
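For reference, a toy sketch of the label-permutation-safe comparison, using Clustering's randindex. The vectors here are made-up placeholders; in the actual test the first argument would be the hard-coded sklearn labels and the second the affinityprop assignments:

```julia
using Test, Clustering

# Two partitions of six points that agree up to relabeling
# (the reference uses labels 1/2, ours uses 2/1 for the same grouping).
ref_assignments   = [1, 1, 1, 2, 2, 2]
julia_assignments = [2, 2, 2, 1, 1, 1]

# randindex returns a tuple whose first element is the adjusted Rand index;
# it equals 1 exactly when the two partitions coincide up to relabeling.
@test randindex(ref_assignments, julia_assignments)[1] ≈ 1.0
```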

@alyst
Member

alyst commented Nov 20, 2018

@jingpengw Thanks for checking this! I think it makes sense to merge with minor corrections:

  1. Include your PyCall script as a block comment (#= =#) before python_assignments, with a note that this is how the vector of assignments was generated: if the test is modified later, we will know how to update the expected output (see the sketch after this list)
  2. Rename python_assignments to ref_assignments and remove the labels variable; I guess it is not used/needed
  3. Split the ref_assignments vector into several lines, none exceeding 80 characters
  4. Remove all println() calls: they generate noise in CI, and if the randindex() test fails, there will be an actual-vs-expected error log anyway
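For point 1, a rough sketch of how the provenance block comment and the wrapped reference vector could be laid out; the values below are placeholders, not the actual sklearn output:

```julia
#=
ref_assignments below was generated with sklearn via PyCall, using the
same similarity matrix S as the Julia test:

    using PyCall
    @pyimport sklearn.cluster as cl
    af = cl.AffinityPropagation(affinity="precomputed")[:fit](S)
    ref = af[:labels_] .+ 1
=#
# Placeholder values; the real test hard-codes the full sklearn output,
# wrapped so that no line exceeds 80 characters.
ref_assignments = [1, 1, 2, 2, 3, 3,
                   1, 2, 3, 1, 2, 3]
```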

@xiuliren
Contributor Author

Updated, please take a look.

@alyst alyst merged commit da391f4 into JuliaStats:master Nov 20, 2018